Dataset statistics
| Number of variables | 21 |
|---|---|
| Number of observations | 1781 |
| Missing cells | 814 |
| Missing cells (%) | 2.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 292.3 KiB |
| Average record size in memory | 168.1 B |
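The derived rows in this table follow from the raw counts. The sketch below recomputes them, assuming that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the Dataset statistics table above.
n_variables = 21
n_observations = 1781
missing_cells = 814
total_size_kib = 292.3

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations

# Share of cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

# Average in-memory size of one record, in bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures the report shows (2.2% and 168.1 B).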
Variable types
| Categorical | 8 |
|---|---|
| Numeric | 13 |
Alerts
| Alert | Type |
|---|---|
| URL has a high cardinality: 1781 distinct values | High cardinality |
| SERVER has a high cardinality: 239 distinct values | High cardinality |
| WHOIS_STATEPRO has a high cardinality: 182 distinct values | High cardinality |
| WHOIS_REGDATE has a high cardinality: 891 distinct values | High cardinality |
| WHOIS_UPDATED_DATE has a high cardinality: 594 distinct values | High cardinality |
| URL_LENGTH is highly overall correlated with NUMBER_SPECIAL_CHARACTERS | High correlation |
| NUMBER_SPECIAL_CHARACTERS is highly overall correlated with URL_LENGTH | High correlation |
| TCP_CONVERSATION_EXCHANGE is highly overall correlated with DIST_REMOTE_TCP_PORT and 6 other fields | High correlation |
| DIST_REMOTE_TCP_PORT is highly overall correlated with TCP_CONVERSATION_EXCHANGE and 6 other fields | High correlation |
| REMOTE_IPS is highly overall correlated with DNS_QUERY_TIMES | High correlation |
| APP_BYTES is highly overall correlated with TCP_CONVERSATION_EXCHANGE and 5 other fields | High correlation |
| SOURCE_APP_PACKETS is highly overall correlated with TCP_CONVERSATION_EXCHANGE and 6 other fields | High correlation |
| REMOTE_APP_PACKETS is highly overall correlated with TCP_CONVERSATION_EXCHANGE and 6 other fields | High correlation |
| SOURCE_APP_BYTES is highly overall correlated with TCP_CONVERSATION_EXCHANGE and 4 other fields | High correlation |
| REMOTE_APP_BYTES is highly overall correlated with TCP_CONVERSATION_EXCHANGE and 5 other fields | High correlation |
| APP_PACKETS is highly overall correlated with TCP_CONVERSATION_EXCHANGE and 6 other fields | High correlation |
| DNS_QUERY_TIMES is highly overall correlated with REMOTE_IPS | High correlation |
| WHOIS_COUNTRY is highly overall correlated with Type | High correlation |
| Type is highly overall correlated with WHOIS_COUNTRY | High correlation |
| CONTENT_LENGTH has 812 (45.6%) missing values | Missing |
| DIST_REMOTE_TCP_PORT is highly skewed (γ1 = 21.89093705) | Skewed |
| APP_BYTES is highly skewed (γ1 = 41.9809937) | Skewed |
| REMOTE_APP_BYTES is highly skewed (γ1 = 41.96456556) | Skewed |
| URL is uniformly distributed | Uniform |
| URL has unique values | Unique |
| TCP_CONVERSATION_EXCHANGE has 657 (36.9%) zeros | Zeros |
| DIST_REMOTE_TCP_PORT has 916 (51.4%) zeros | Zeros |
| REMOTE_IPS has 657 (36.9%) zeros | Zeros |
| APP_BYTES has 657 (36.9%) zeros | Zeros |
| SOURCE_APP_PACKETS has 655 (36.8%) zeros | Zeros |
| REMOTE_APP_PACKETS has 590 (33.1%) zeros | Zeros |
| SOURCE_APP_BYTES has 590 (33.1%) zeros | Zeros |
| REMOTE_APP_BYTES has 655 (36.8%) zeros | Zeros |
| APP_PACKETS has 655 (36.8%) zeros | Zeros |
| DNS_QUERY_TIMES has 976 (54.8%) zeros | Zeros |
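The Skewed and Zeros entries above rest on two simple per-column quantities: the skewness coefficient γ1 and the fraction of values equal to zero. The following is a minimal sketch of both, assuming the plain moment-ratio definition of skewness; the report's own figures may include a bias correction, so exact values can differ. The function names and the toy column are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # a constant column has no skew
    return m3 / m2 ** 1.5


def zero_fraction(values):
    """Share of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Toy column (not from the dataset): mostly zeros with one large value,
# the same shape that produces both a Zeros and a Skewed alert.
toy = [0, 0, 0, 0, 1, 2, 50]
print(skewness(toy), zero_fraction(toy))
```

A column dominated by zeros with a few very large values yields a strongly positive γ1, which is the pattern the traffic counters in this report show.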
Reproduction
| Analysis started | 2022-11-30 18:43:25.654052 |
|---|---|
| Analysis finished | 2022-11-30 18:44:07.764774 |
| Duration | 42.11 seconds |
| Software version | pandas-profiling v3.5.0 |
URL
Categorical
| Distinct | 1781 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.0 KiB |
| M0_109 | 1 |
|---|---|
| B0_999 | 1 |
| B0_2292 | 1 |
| B0_2168 | 1 |
| B0_2108 | 1 |
| Other values (1776) |
Length
| Max length | 7 |
|---|---|
| Median length | 6 |
| Mean length | 6.2453678 |
| Min length | 4 |
Characters and Unicode
| Total characters | 11123 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 1781 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | M0_109 |
|---|---|
| 2nd row | B0_2314 |
| 3rd row | B0_911 |
| 4th row | B0_113 |
| 5th row | B0_403 |
Common Values
| Value | Count | Frequency (%) |
| M0_109 | 1 | 0.1% |
| B0_999 | 1 | 0.1% |
| B0_2292 | 1 | 0.1% |
| B0_2168 | 1 | 0.1% |
| B0_2108 | 1 | 0.1% |
| B0_2053 | 1 | 0.1% |
| B0_2035 | 1 | 0.1% |
| B0_1400 | 1 | 0.1% |
| B0_1297 | 1 | 0.1% |
| B0_1278 | 1 | 0.1% |
| Other values (1771) | 1771 |
Words
| Value | Count | Frequency (%) |
| m0_109 | 1 | 0.1% |
| b0_916 | 1 | 0.1% |
| b0_911 | 1 | 0.1% |
| b0_113 | 1 | 0.1% |
| b0_403 | 1 | 0.1% |
| b0_2064 | 1 | 0.1% |
| b0_462 | 1 | 0.1% |
| b0_1128 | 1 | 0.1% |
| m2_17 | 1 | 0.1% |
| m3_75 | 1 | 0.1% |
| Other values (1771) | 1771 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 2222 | |
| _ | 1781 | |
| B | 1565 | |
| 1 | 1108 | |
| 2 | 930 | |
| 3 | 563 | 5.1% |
| 4 | 554 | 5.0% |
| 6 | 447 | 4.0% |
| 5 | 441 | 4.0% |
| 7 | 433 | 3.9% |
| Other values (3) | 1079 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 7561 | |
| Connector Punctuation | 1781 | 16.0% |
| Uppercase Letter | 1781 | 16.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 2222 | |
| 1 | 1108 | |
| 2 | 930 | |
| 3 | 563 | 7.4% |
| 4 | 554 | 7.3% |
| 6 | 447 | 5.9% |
| 5 | 441 | 5.8% |
| 7 | 433 | 5.7% |
| 8 | 432 | 5.7% |
| 9 | 431 | 5.7% |
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 1565 | |
| M | 216 | 12.1% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 1781 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 9342 | |
| Latin | 1781 | 16.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 2222 | |
| _ | 1781 | |
| 1 | 1108 | |
| 2 | 930 | |
| 3 | 563 | 6.0% |
| 4 | 554 | 5.9% |
| 6 | 447 | 4.8% |
| 5 | 441 | 4.7% |
| 7 | 433 | 4.6% |
| 8 | 432 | 4.6% |
Latin
| Value | Count | Frequency (%) |
| B | 1565 | |
| M | 216 | 12.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11123 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 2222 | |
| _ | 1781 | |
| B | 1565 | |
| 1 | 1108 | |
| 2 | 930 | |
| 3 | 563 | 5.1% |
| 4 | 554 | 5.0% |
| 6 | 447 | 4.0% |
| 5 | 441 | 4.0% |
| 7 | 433 | 3.9% |
| Other values (3) | 1079 |
URL_LENGTH
Real number (ℝ)
| Distinct | 142 |
|---|---|
| Distinct (%) | 8.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 56.961258 |
| Minimum | 16 |
| Maximum | 249 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 16 |
|---|---|
| 5-th percentile | 26 |
| Q1 | 39 |
| median | 49 |
| Q3 | 68 |
| 95-th percentile | 110 |
| Maximum | 249 |
| Range | 233 |
| Interquartile range (IQR) | 29 |
Descriptive statistics
| Standard deviation | 27.555586 |
|---|---|
| Coefficient of variation (CV) | 0.48376013 |
| Kurtosis | 5.0208838 |
| Mean | 56.961258 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 1.8026857 |
| Sum | 101448 |
| Variance | 759.3103 |
| Monotonicity | Increasing |
| Value | Count | Frequency (%) |
| 39 | 86 | 4.8% |
| 40 | 48 | 2.7% |
| 46 | 44 | 2.5% |
| 42 | 43 | 2.4% |
| 38 | 43 | 2.4% |
| 47 | 41 | 2.3% |
| 45 | 41 | 2.3% |
| 49 | 39 | 2.2% |
| 35 | 37 | 2.1% |
| 44 | 36 | 2.0% |
| Other values (132) | 1323 |
| Value | Count | Frequency (%) |
| 16 | 3 | 0.2% |
| 17 | 2 | 0.1% |
| 18 | 2 | 0.1% |
| 19 | 1 | 0.1% |
| 20 | 7 | |
| 21 | 4 | 0.2% |
| 22 | 9 | |
| 23 | 15 | |
| 24 | 14 | |
| 25 | 12 |
| Value | Count | Frequency (%) |
| 249 | 1 | |
| 234 | 1 | |
| 201 | 1 | |
| 198 | 1 | |
| 194 | 2 | |
| 183 | 1 | |
| 178 | 1 | |
| 173 | 1 | |
| 170 | 1 | |
| 169 | 1 |
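Several URL_LENGTH rows are derived from others, which gives a quick consistency check on the table. The sketch below recomputes them from the reported figures, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 − Q1); the variable names are illustrative.

```python
# Figures copied from the URL_LENGTH tables above.
n = 1781
total = 101448
mean = 56.961258
std = 27.555586
q1, q3 = 39, 68
minimum, maximum = 16, 249

cv = std / mean              # coefficient of variation
variance = std ** 2          # variance from the standard deviation
recomputed_mean = total / n  # mean from the sum and the row count
iqr = q3 - q1                # interquartile range
value_range = maximum - minimum

print(round(cv, 6), round(variance, 4), round(recomputed_mean, 6), iqr, value_range)
```

The recomputed values agree with the reported CV (0.48376013), variance (759.3103), IQR (29), and range (233) to the precision shown.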
NUMBER_SPECIAL_CHARACTERS
Real number (ℝ)
| Distinct | 31 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.111735 |
| Minimum | 5 |
| Maximum | 43 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 5 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 8 |
| median | 10 |
| Q3 | 13 |
| 95-th percentile | 20 |
| Maximum | 43 |
| Range | 38 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 4.549896 |
|---|---|
| Coefficient of variation (CV) | 0.40946765 |
| Kurtosis | 5.2617134 |
| Mean | 11.111735 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 1.8799729 |
| Sum | 19790 |
| Variance | 20.701553 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 9 | 274 | |
| 8 | 211 | |
| 11 | 208 | |
| 10 | 198 | |
| 7 | 159 | |
| 6 | 148 | |
| 12 | 134 | |
| 13 | 92 | 5.2% |
| 14 | 58 | 3.3% |
| 15 | 50 | 2.8% |
| Other values (21) | 249 |
| Value | Count | Frequency (%) |
| 5 | 2 | 0.1% |
| 6 | 148 | |
| 7 | 159 | |
| 8 | 211 | |
| 9 | 274 | |
| 10 | 198 | |
| 11 | 208 | |
| 12 | 134 | |
| 13 | 92 | 5.2% |
| 14 | 58 | 3.3% |
| Value | Count | Frequency (%) |
| 43 | 1 | 0.1% |
| 40 | 1 | 0.1% |
| 36 | 1 | 0.1% |
| 34 | 3 | |
| 31 | 2 | 0.1% |
| 30 | 1 | 0.1% |
| 29 | 4 | |
| 28 | 2 | 0.1% |
| 27 | 6 | |
| 26 | 7 |
CHARSET
Categorical
| Distinct | 9 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.0 KiB |
| UTF-8 | |
|---|---|
| ISO-8859-1 | |
| utf-8 | |
| us-ascii | |
| iso-8859-1 | |
| Other values (4) | 10 |
Length
| Max length | 12 |
|---|---|
| Median length | 5 |
| Mean length | 6.841662 |
| Min length | 4 |
Characters and Unicode
| Total characters | 12185 |
|---|---|
| Distinct characters | 25 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 3 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | iso-8859-1 |
|---|---|
| 2nd row | UTF-8 |
| 3rd row | us-ascii |
| 4th row | ISO-8859-1 |
| 5th row | UTF-8 |
Common Values
| Value | Count | Frequency (%) |
| UTF-8 | 676 | |
| ISO-8859-1 | 427 | |
| utf-8 | 379 | |
| us-ascii | 155 | 8.7% |
| iso-8859-1 | 134 | 7.5% |
| None | 7 | 0.4% |
| windows-1251 | 1 | 0.1% |
| ISO-8859 | 1 | 0.1% |
| windows-1252 | 1 | 0.1% |
Words
| Value | Count | Frequency (%) |
| utf-8 | 1055 | |
| iso-8859-1 | 561 | |
| us-ascii | 155 | 8.7% |
| none | 7 | 0.4% |
| windows-1251 | 1 | 0.1% |
| iso-8859 | 1 | 0.1% |
| windows-1252 | 1 | 0.1% |
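The two CHARSET tables differ only in letter case: the nine raw categories collapse to seven once values are lowercased. A minimal sketch of that normalization, using the counts from the Common Values table (the variable names are illustrative):

```python
from collections import Counter

# Raw category counts copied from the CHARSET Common Values table.
raw_counts = {
    "UTF-8": 676,
    "ISO-8859-1": 427,
    "utf-8": 379,
    "us-ascii": 155,
    "iso-8859-1": 134,
    "None": 7,
    "windows-1251": 1,
    "ISO-8859": 1,
    "windows-1252": 1,
}

# Merge categories that differ only by case.
normalized = Counter()
for value, count in raw_counts.items():
    normalized[value.lower()] += count

for value, count in normalized.most_common():
    print(value, count)
```

Applying the same lowercasing before modelling would avoid treating `UTF-8` and `utf-8` as separate categories.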
Most occurring characters
| Value | Count | Frequency (%) |
| - | 2335 | |
| 8 | 2179 | |
| U | 676 | 5.5% |
| F | 676 | 5.5% |
| T | 676 | 5.5% |
| 5 | 564 | 4.6% |
| 1 | 564 | 4.6% |
| 9 | 562 | 4.6% |
| u | 534 | 4.4% |
| s | 446 | 3.7% |
| Other values (15) | 2973 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 3872 | |
| Uppercase Letter | 3319 | |
| Lowercase Letter | 2659 | |
| Dash Punctuation | 2335 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| u | 534 | |
| s | 446 | |
| i | 446 | |
| t | 379 | |
| f | 379 | |
| a | 155 | 5.8% |
| c | 155 | 5.8% |
| o | 143 | 5.4% |
| n | 9 | 0.3% |
| e | 7 | 0.3% |
| Other values (2) | 6 | 0.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| U | 676 | |
| F | 676 | |
| T | 676 | |
| I | 428 | |
| S | 428 | |
| O | 428 | |
| N | 7 | 0.2% |
Decimal Number
| Value | Count | Frequency (%) |
| 8 | 2179 | |
| 5 | 564 | 14.6% |
| 1 | 564 | 14.6% |
| 9 | 562 | 14.5% |
| 2 | 3 | 0.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 2335 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 6207 | |
| Latin | 5978 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| U | 676 | |
| F | 676 | |
| T | 676 | |
| u | 534 | |
| s | 446 | |
| i | 446 | |
| I | 428 | |
| S | 428 | |
| O | 428 | |
| t | 379 | |
| Other values (9) | 861 |
Common
| Value | Count | Frequency (%) |
| - | 2335 | |
| 8 | 2179 | |
| 5 | 564 | 9.1% |
| 1 | 564 | 9.1% |
| 9 | 562 | 9.1% |
| 2 | 3 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 12185 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| - | 2335 | |
| 8 | 2179 | |
| U | 676 | 5.5% |
| F | 676 | 5.5% |
| T | 676 | 5.5% |
| 5 | 564 | 4.6% |
| 1 | 564 | 4.6% |
| 9 | 562 | 4.6% |
| u | 534 | 4.4% |
| s | 446 | 3.7% |
| Other values (15) | 2973 |
SERVER
Categorical
| Distinct | 239 |
|---|---|
| Distinct (%) | 13.4% |
| Missing | 1 |
| Missing (%) | 0.1% |
| Memory size | 14.0 KiB |
| Apache | |
|---|---|
| nginx | |
| None | |
| Microsoft-HTTPAPI/2.0 | |
| cloudflare-nginx | |
| Other values (234) |
Length
| Max length | 171 |
|---|---|
| Median length | 114 |
| Mean length | 13.748315 |
| Min length | 2 |
Characters and Unicode
| Total characters | 24472 |
|---|---|
| Distinct characters | 72 |
| Distinct categories | 10 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 142 |
|---|---|
| Unique (%) | 8.0% |
Sample
| 1st row | nginx |
|---|---|
| 2nd row | Apache/2.4.10 |
| 3rd row | Microsoft-HTTPAPI/2.0 |
| 4th row | nginx |
| 5th row | None |
Common Values
| Value | Count | Frequency (%) |
| Apache | 386 | |
| nginx | 211 | 11.8% |
| None | 175 | 9.8% |
| Microsoft-HTTPAPI/2.0 | 113 | 6.3% |
| cloudflare-nginx | 94 | 5.3% |
| Microsoft-IIS/7.5 | 51 | 2.9% |
| GSE | 49 | 2.8% |
| Server | 49 | 2.8% |
| YouTubeFrontEnd | 42 | 2.4% |
| nginx/1.12.0 | 36 | 2.0% |
| Other values (229) | 574 |
Words
| Value | Count | Frequency (%) |
| apache | 387 | 16.2% |
| nginx | 216 | 9.0% |
| none | 175 | 7.3% |
| microsoft-httpapi/2.0 | 113 | 4.7% |
| cloudflare-nginx | 94 | 3.9% |
| server | 55 | 2.3% |
| centos | 52 | 2.2% |
| microsoft-iis/7.5 | 52 | 2.2% |
| unix | 52 | 2.2% |
| gse | 49 | 2.1% |
| Other values (307) | 1142 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 1612 | 6.6% |
| . | 1591 | 6.5% |
| n | 1551 | 6.3% |
| o | 1058 | 4.3% |
| c | 1033 | 4.2% |
| 2 | 950 | 3.9% |
| i | 942 | 3.8% |
| / | 896 | 3.7% |
| a | 896 | 3.7% |
| A | 843 | 3.4% |
| Other values (62) | 13100 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 13548 | |
| Uppercase Letter | 3764 | 15.4% |
| Decimal Number | 3060 | 12.5% |
| Other Punctuation | 2492 | 10.2% |
| Space Separator | 611 | 2.5% |
| Dash Punctuation | 424 | 1.7% |
| Close Punctuation | 221 | 0.9% |
| Open Punctuation | 221 | 0.9% |
| Connector Punctuation | 119 | 0.5% |
| Math Symbol | 12 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 1612 | |
| n | 1551 | |
| o | 1058 | 7.8% |
| c | 1033 | 7.6% |
| i | 942 | 7.0% |
| a | 896 | 6.6% |
| p | 840 | 6.2% |
| h | 738 | 5.4% |
| r | 569 | 4.2% |
| t | 565 | 4.2% |
| Other values (16) | 3744 |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 843 | |
| S | 508 | |
| P | 395 | |
| T | 310 | 8.2% |
| I | 285 | 7.6% |
| M | 204 | 5.4% |
| H | 193 | 5.1% |
| N | 182 | 4.8% |
| O | 147 | 3.9% |
| E | 96 | 2.6% |
| Other values (15) | 601 |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 950 | |
| 1 | 701 | |
| 0 | 384 | |
| 5 | 239 | 7.8% |
| 4 | 207 | 6.8% |
| 3 | 157 | 5.1% |
| 7 | 109 | 3.6% |
| 8 | 108 | 3.5% |
| 6 | 107 | 3.5% |
| 9 | 98 | 3.2% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 1591 | |
| / | 896 | |
| ; | 3 | 0.1% |
| ! | 1 | < 0.1% |
| & | 1 | < 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 611 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 424 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 221 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 221 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 119 |
Math Symbol
| Value | Count | Frequency (%) |
| + | 12 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 17312 | |
| Common | 7160 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 1612 | 9.3% |
| n | 1551 | 9.0% |
| o | 1058 | 6.1% |
| c | 1033 | 6.0% |
| i | 942 | 5.4% |
| a | 896 | 5.2% |
| A | 843 | 4.9% |
| p | 840 | 4.9% |
| h | 738 | 4.3% |
| r | 569 | 3.3% |
| Other values (41) | 7230 |
Common
| Value | Count | Frequency (%) |
| . | 1591 | |
| 2 | 950 | |
| / | 896 | |
| 1 | 701 | |
| (space) | 611 | 8.5% |
| - | 424 | 5.9% |
| 0 | 384 | 5.4% |
| 5 | 239 | 3.3% |
| ) | 221 | 3.1% |
| ( | 221 | 3.1% |
| Other values (11) | 922 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 24472 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 1612 | 6.6% |
| . | 1591 | 6.5% |
| n | 1551 | 6.3% |
| o | 1058 | 4.3% |
| c | 1033 | 4.2% |
| 2 | 950 | 3.9% |
| i | 942 | 3.8% |
| / | 896 | 3.7% |
| a | 896 | 3.7% |
| A | 843 | 3.4% |
| Other values (62) | 13100 |
CONTENT_LENGTH
Real number (ℝ)
| Distinct | 637 |
|---|---|
| Distinct (%) | 65.7% |
| Missing | 812 |
| Missing (%) | 45.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11726.928 |
| Minimum | 0 |
| Maximum | 649263 |
| Zeros | 5 |
| Zeros (%) | 0.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 162 |
| Q1 | 324 |
| median | 1853 |
| Q3 | 11323 |
| 95-th percentile | 44319.6 |
| Maximum | 649263 |
| Range | 649263 |
| Interquartile range (IQR) | 10999 |
Descriptive statistics
| Standard deviation | 36391.809 |
|---|---|
| Coefficient of variation (CV) | 3.1032688 |
| Kurtosis | 144.64797 |
| Mean | 11726.928 |
| Median Absolute Deviation (MAD) | 1691 |
| Skewness | 10.571179 |
| Sum | 11363393 |
| Variance | 1.3243638 × 10⁹ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 324 | 138 | 7.7% |
| 1819 | 20 | 1.1% |
| 2516 | 13 | 0.7% |
| 162 | 12 | 0.7% |
| 345 | 11 | 0.6% |
| 6748 | 8 | 0.4% |
| 640 | 7 | 0.4% |
| 11 | 7 | 0.4% |
| 257 | 7 | 0.4% |
| 34 | 6 | 0.3% |
| Other values (627) | 740 | |
| (Missing) | 812 |
| Value | Count | Frequency (%) |
| 0 | 5 | |
| 9 | 6 | |
| 11 | 7 | |
| 13 | 2 | 0.1% |
| 20 | 2 | 0.1% |
| 21 | 1 | 0.1% |
| 26 | 1 | 0.1% |
| 34 | 6 | |
| 39 | 3 | |
| 57 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 649263 | 1 | |
| 435494 | 1 | |
| 420762 | 1 | |
| 359174 | 1 | |
| 256306 | 1 | |
| 246324 | 1 | |
| 208082 | 1 | |
| 135444 | 1 | |
| 124140 | 1 | |
| 121211 | 1 |
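The CONTENT_LENGTH percentages use two different denominators, which is easy to miss. The sketch below checks this from the reported figures: the Missing and Zeros percentages match a denominator of all 1781 rows, while Distinct (%) and the mean match a denominator of the non-missing rows only. That reading is inferred from the arithmetic, not stated in the report, and the variable names are illustrative.

```python
# Figures copied from the CONTENT_LENGTH tables above.
n_rows = 1781
n_missing = 812
n_distinct = 637
n_zeros = 5
total = 11363393

n_present = n_rows - n_missing  # rows with an actual value

missing_pct = 100 * n_missing / n_rows       # over all rows
zeros_pct = 100 * n_zeros / n_rows           # over all rows
distinct_pct = 100 * n_distinct / n_present  # over non-missing rows
mean = total / n_present                     # over non-missing rows

print(n_present, round(missing_pct, 1), round(zeros_pct, 1),
      round(distinct_pct, 1), round(mean, 3))
```

Dividing the distinct count by all 1781 rows would give about 35.8% instead of the reported 65.7%, so the non-missing denominator is the one consistent with the table.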
WHOIS_COUNTRY
Categorical
| Distinct | 49 |
|---|---|
| Distinct (%) | 2.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.0 KiB |
| US | |
|---|---|
| None | |
| CA | 84 |
| ES | 63 |
| AU | 35 |
| Other values (44) |
Length
| Max length | 14 |
|---|---|
| Median length | 2 |
| Mean length | 2.3885458 |
| Min length | 2 |
Characters and Unicode
| Total characters | 4254 |
|---|---|
| Distinct characters | 40 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 11 |
|---|---|
| Unique (%) | 0.6% |
Sample
| 1st row | None |
|---|---|
| 2nd row | None |
| 3rd row | None |
| 4th row | US |
| 5th row | US |
Common Values
| Value | Count | Frequency (%) |
| US | 1103 | |
| None | 306 | 17.2% |
| CA | 84 | 4.7% |
| ES | 63 | 3.5% |
| AU | 35 | 2.0% |
| PA | 21 | 1.2% |
| GB | 19 | 1.1% |
| JP | 11 | 0.6% |
| CN | 10 | 0.6% |
| IN | 10 | 0.6% |
| Other values (39) | 119 | 6.7% |
Words
| Value | Count | Frequency (%) |
| us | 1106 | |
| none | 306 | 17.1% |
| ca | 84 | 4.7% |
| es | 63 | 3.5% |
| au | 35 | 2.0% |
| pa | 21 | 1.2% |
| gb | 19 | 1.1% |
| jp | 11 | 0.6% |
| cn | 10 | 0.6% |
| in | 10 | 0.6% |
| Other values (38) | 122 | 6.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 1178 | |
| U | 1162 | |
| N | 334 | 7.9% |
| n | 308 | 7.2% |
| e | 308 | 7.2% |
| o | 307 | 7.2% |
| A | 147 | 3.5% |
| C | 114 | 2.7% |
| E | 74 | 1.7% |
| P | 37 | 0.9% |
| Other values (30) | 285 | 6.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 3248 | |
| Lowercase Letter | 965 | 22.7% |
| Other Punctuation | 25 | 0.6% |
| Space Separator | 6 | 0.1% |
| Open Punctuation | 5 | 0.1% |
| Close Punctuation | 5 | 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 1178 | |
| U | 1162 | |
| N | 334 | 10.3% |
| A | 147 | 4.5% |
| C | 114 | 3.5% |
| E | 74 | 2.3% |
| P | 37 | 1.1% |
| B | 34 | 1.0% |
| K | 30 | 0.9% |
| G | 27 | 0.8% |
| Other values (12) | 111 | 3.4% |
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 308 | |
| e | 308 | |
| o | 307 | |
| u | 19 | 2.0% |
| s | 6 | 0.6% |
| r | 6 | 0.6% |
| y | 2 | 0.2% |
| p | 2 | 0.2% |
| i | 2 | 0.2% |
| d | 2 | 0.2% |
| Other values (3) | 3 | 0.3% |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 20 | |
| ; | 5 | 20.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 6 |
Open Punctuation
| Value | Count | Frequency (%) |
| [ | 5 |
Close Punctuation
| Value | Count | Frequency (%) |
| ] | 5 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 4213 | |
| Common | 41 | 1.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 1178 | |
| U | 1162 | |
| N | 334 | 7.9% |
| n | 308 | 7.3% |
| e | 308 | 7.3% |
| o | 307 | 7.3% |
| A | 147 | 3.5% |
| C | 114 | 2.7% |
| E | 74 | 1.8% |
| P | 37 | 0.9% |
| Other values (25) | 244 | 5.8% |
Common
| Value | Count | Frequency (%) |
| ' | 20 | |
| (space) | 6 | 14.6% |
| [ | 5 | 12.2% |
| ] | 5 | 12.2% |
| ; | 5 | 12.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 4254 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 1178 | |
| U | 1162 | |
| N | 334 | 7.9% |
| n | 308 | 7.2% |
| e | 308 | 7.2% |
| o | 307 | 7.2% |
| A | 147 | 3.5% |
| C | 114 | 2.7% |
| E | 74 | 1.7% |
| P | 37 | 0.9% |
| Other values (30) | 285 | 6.7% |
WHOIS_STATEPRO
Categorical
| Distinct | 182 |
|---|---|
| Distinct (%) | 10.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.0 KiB |
| CA | |
|---|---|
| None | |
| NY | 75 |
| WA | 65 |
| Barcelona | 62 |
| Other values (177) |
Length
| Max length | 20 |
|---|---|
| Median length | 2 |
| Mean length | 4.032566 |
| Min length | 1 |
Characters and Unicode
| Total characters | 7182 |
|---|---|
| Distinct characters | 61 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 91 |
|---|---|
| Unique (%) | 5.1% |
Sample
| 1st row | None |
|---|---|
| 2nd row | None |
| 3rd row | None |
| 4th row | AK |
| 5th row | TX |
Common Values
| Value | Count | Frequency (%) |
| CA | 372 | |
| None | 362 | |
| NY | 75 | 4.2% |
| WA | 65 | 3.6% |
| Barcelona | 62 | 3.5% |
| FL | 61 | 3.4% |
| Arizona | 58 | 3.3% |
| California | 57 | 3.2% |
| ON | 45 | 2.5% |
| NV | 30 | 1.7% |
| Other values (172) | 594 |
Words
| Value | Count | Frequency (%) |
| ca | 376 | |
| none | 363 | |
| ny | 76 | 4.2% |
| wa | 65 | 3.5% |
| barcelona | 62 | 3.4% |
| fl | 61 | 3.3% |
| arizona | 58 | 3.2% |
| california | 58 | 3.2% |
| on | 45 | 2.5% |
| nv | 30 | 1.6% |
| Other values (161) | 637 |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 710 | 9.9% |
| A | 697 | 9.7% |
| o | 676 | 9.4% |
| N | 612 | 8.5% |
| e | 575 | 8.0% |
| a | 507 | 7.1% |
| C | 496 | 6.9% |
| i | 319 | 4.4% |
| r | 265 | 3.7% |
| l | 192 | 2.7% |
| Other values (51) | 2133 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 4039 | |
| Uppercase Letter | 3069 | |
| Space Separator | 50 | 0.7% |
| Decimal Number | 12 | 0.2% |
| Dash Punctuation | 8 | 0.1% |
| Other Punctuation | 4 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 710 | |
| o | 676 | |
| e | 575 | |
| a | 507 | |
| i | 319 | |
| r | 265 | 6.6% |
| l | 192 | 4.8% |
| s | 135 | 3.3% |
| c | 100 | 2.5% |
| t | 71 | 1.8% |
| Other values (16) | 489 |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 697 | |
| N | 612 | |
| C | 496 | |
| O | 142 | 4.6% |
| M | 107 | 3.5% |
| L | 107 | 3.5% |
| W | 99 | 3.2% |
| Y | 91 | 3.0% |
| B | 81 | 2.6% |
| T | 77 | 2.5% |
| Other values (16) | 560 |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 7 | |
| 0 | 2 | 16.7% |
| 3 | 1 | 8.3% |
| 6 | 1 | 8.3% |
| 2 | 1 | 8.3% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 3 | |
| @ | 1 | 25.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 50 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 8 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 7108 | |
| Common | 74 | 1.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| n | 710 | 10.0% |
| A | 697 | 9.8% |
| o | 676 | 9.5% |
| N | 612 | 8.6% |
| e | 575 | 8.1% |
| a | 507 | 7.1% |
| C | 496 | 7.0% |
| i | 319 | 4.5% |
| r | 265 | 3.7% |
| l | 192 | 2.7% |
| Other values (42) | 2059 |
Common
| Value | Count | Frequency (%) |
| (space) | 50 | |
| - | 8 | 10.8% |
| 1 | 7 | 9.5% |
| . | 3 | 4.1% |
| 0 | 2 | 2.7% |
| @ | 1 | 1.4% |
| 3 | 1 | 1.4% |
| 6 | 1 | 1.4% |
| 2 | 1 | 1.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 7182 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| n | 710 | 9.9% |
| A | 697 | 9.7% |
| o | 676 | 9.4% |
| N | 612 | 8.5% |
| e | 575 | 8.0% |
| a | 507 | 7.1% |
| C | 496 | 6.9% |
| i | 319 | 4.4% |
| r | 265 | 3.7% |
| l | 192 | 2.7% |
| Other values (51) | 2133 |
WHOIS_REGDATE
Categorical
| Distinct | 891 |
|---|---|
| Distinct (%) | 50.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.0 KiB |
| None | 127 |
|---|---|
| 17/09/2008 0:00 | 62 |
| 13/01/2001 0:12 | 59 |
| 31/07/2000 0:00 | 47 |
| 15/02/2005 0:00 | 41 |
| Other values (886) |
Length
| Max length | 22 |
|---|---|
| Median length | 15 |
| Mean length | 13.982594 |
| Min length | 1 |
Characters and Unicode
| Total characters | 24903 |
|---|---|
| Distinct characters | 22 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 701 |
|---|---|
| Unique (%) | 39.4% |
Sample
| 1st row | 10/10/2015 18:21 |
|---|---|
| 2nd row | None |
| 3rd row | None |
| 4th row | 7/10/1997 4:00 |
| 5th row | 12/05/1996 0:00 |
Common Values
| Value | Count | Frequency (%) |
| None | 127 | 7.1% |
| 17/09/2008 0:00 | 62 | 3.5% |
| 13/01/2001 0:12 | 59 | 3.3% |
| 31/07/2000 0:00 | 47 | 2.6% |
| 15/02/2005 0:00 | 41 | 2.3% |
| 29/03/1997 0:00 | 33 | 1.9% |
| 1/11/1994 0:00 | 30 | 1.7% |
| 18/01/1995 0:00 | 25 | 1.4% |
| 2/11/2002 0:00 | 21 | 1.2% |
| 16/05/1995 0:00 | 17 | 1.0% |
| Other values (881) | 1319 |
Words
| Value | Count | Frequency (%) |
| 0:00 | 1470 | |
| none | 127 | 3.7% |
| 17/09/2008 | 62 | 1.8% |
| 13/01/2001 | 59 | 1.7% |
| 0:12 | 59 | 1.7% |
| 31/07/2000 | 47 | 1.4% |
| 15/02/2005 | 41 | 1.2% |
| 29/03/1997 | 33 | 1.0% |
| 1/11/1994 | 30 | 0.9% |
| 18/01/1995 | 25 | 0.7% |
| Other values (941) | 1474 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 8294 | |
| / | 3292 | 13.2% |
| 1 | 2485 | 10.0% |
| 2 | 2196 | 8.8% |
| 9 | 1701 | 6.8% |
| : | 1656 | 6.6% |
| (space) | 1646 | 6.6% |
| 3 | 605 | 2.4% |
| 5 | 592 | 2.4% |
| 7 | 490 | 2.0% |
| Other values (12) | 1946 | 7.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 17774 | |
| Other Punctuation | 4953 | 19.9% |
| Space Separator | 1646 | 6.6% |
| Lowercase Letter | 383 | 1.5% |
| Uppercase Letter | 137 | 0.6% |
| Dash Punctuation | 10 | < 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 8294 | |
| 1 | 2485 | 14.0% |
| 2 | 2196 | 12.4% |
| 9 | 1701 | 9.6% |
| 3 | 605 | 3.4% |
| 5 | 592 | 3.3% |
| 7 | 490 | 2.8% |
| 8 | 484 | 2.7% |
| 6 | 480 | 2.7% |
| 4 | 447 | 2.5% |
Lowercase Letter
| Value | Count | Frequency (%) |
| o | 127 | |
| e | 127 | |
| n | 127 | |
| b | 2 | 0.5% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 3292 | |
| : | 1656 | |
| . | 5 | 0.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 127 | |
| T | 5 | 3.6% |
| Z | 5 | 3.6% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1646 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 10 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 24383 | |
| Latin | 520 | 2.1% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 8294 | |
| / | 3292 | 13.5% |
| 1 | 2485 | 10.2% |
| 2 | 2196 | 9.0% |
| 9 | 1701 | 7.0% |
| : | 1656 | 6.8% |
| (space) | 1646 | 6.8% |
| 3 | 605 | 2.5% |
| 5 | 592 | 2.4% |
| 7 | 490 | 2.0% |
| Other values (5) | 1426 | 5.8% |
Latin
| Value | Count | Frequency (%) |
| N | 127 | |
| o | 127 | |
| e | 127 | |
| n | 127 | |
| T | 5 | 1.0% |
| Z | 5 | 1.0% |
| b | 2 | 0.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 24903 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 8294 | |
| / | 3292 | 13.2% |
| 1 | 2485 | 10.0% |
| 2 | 2196 | 8.8% |
| 9 | 1701 | 6.8% |
| : | 1656 | 6.6% |
| (space) | 1646 | 6.6% |
| 3 | 605 | 2.4% |
| 5 | 592 | 2.4% |
| 7 | 490 | 2.0% |
| Other values (12) | 1946 | 7.8% |
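WHOIS_REGDATE is profiled as a categorical string, but most values are day-first timestamps such as `17/09/2008 0:00`, alongside the literal string `None`. The sketch below shows one way to parse them with the standard library. The second format in `FORMATS` is a guess at the handful of values containing `T`, `Z`, `-`, and `.` characters (the report does not show them verbatim), and `parse_whois_date` is a hypothetical helper name.

```python
from datetime import datetime

# First format matches the day-first values shown in the report.
# Second format is an ASSUMED ISO-like layout for the few remaining values.
FORMATS = ("%d/%m/%Y %H:%M", "%Y-%m-%dT%H:%M:%S.%fZ")


def parse_whois_date(text):
    """Return a datetime, or None when the value is missing or unparseable."""
    if text is None or text.strip().lower() == "none":
        return None
    cleaned = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # try the next candidate format
    return None


print(parse_whois_date("7/10/1997 4:00"))
print(parse_whois_date("None"))
```

Once parsed, the column could be profiled as a date rather than as 891 distinct categories, which would also remove its High cardinality alert.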
WHOIS_UPDATED_DATE
Categorical
| Distinct | 594 |
|---|---|
| Distinct (%) | 33.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.0 KiB |
| None | 139 |
|---|---|
| 2/09/2016 0:00 | 64 |
| 12/12/2015 10:16 | 59 |
| 29/06/2016 0:00 | 47 |
| 14/01/2017 0:00 | 42 |
| Other values (589) |
Length
| Max length | 22 |
|---|---|
| Median length | 15 |
| Mean length | 13.947221 |
| Min length | 4 |
Characters and Unicode
| Total characters | 24840 |
|---|---|
| Distinct characters | 21 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 341 |
|---|---|
| Unique (%) | 19.1% |
Sample
| 1st row | None |
|---|---|
| 2nd row | None |
| 3rd row | None |
| 4th row | 12/09/2013 0:45 |
| 5th row | 11/04/2017 0:00 |
Common Values
| Value | Count | Frequency (%) |
| None | 139 | 7.8% |
| 2/09/2016 0:00 | 64 | 3.6% |
| 12/12/2015 10:16 | 59 | 3.3% |
| 29/06/2016 0:00 | 47 | 2.6% |
| 14/01/2017 0:00 | 42 | 2.4% |
| 29/11/2016 0:00 | 36 | 2.0% |
| 26/08/2015 0:00 | 31 | 1.7% |
| 21/10/2016 0:00 | 30 | 1.7% |
| 30/04/2014 0:00 | 29 | 1.6% |
| 3/03/2017 0:00 | 27 | 1.5% |
| Other values (584) | 1277 |
Words
| Value | Count | Frequency (%) |
| 0:00 | 1472 | |
| none | 139 | 4.1% |
| 2/09/2016 | 64 | 1.9% |
| 12/12/2015 | 59 | 1.7% |
| 10:16 | 59 | 1.7% |
| 29/06/2016 | 47 | 1.4% |
| 14/01/2017 | 43 | 1.3% |
| 29/11/2016 | 36 | 1.1% |
| 26/08/2015 | 31 | 0.9% |
| 21/10/2016 | 30 | 0.9% |
| Other values (617) | 1438 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 7651 | |
| / | 3274 | |
| 1 | 3203 | |
| 2 | 2844 | 11.4% |
| : | 1647 | 6.6% |
| (space) | 1637 | 6.6% |
| 6 | 1108 | 4.5% |
| 7 | 668 | 2.7% |
| 5 | 560 | 2.3% |
| 4 | 515 | 2.1% |
| Other values (11) | 1733 | 7.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 17701 | |
| Other Punctuation | 4926 | 19.8% |
| Space Separator | 1637 | 6.6% |
| Lowercase Letter | 417 | 1.7% |
| Uppercase Letter | 149 | 0.6% |
| Dash Punctuation | 10 | < 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 7651 | |
| 1 | 3203 | |
| 2 | 2844 | 16.1% |
| 6 | 1108 | 6.3% |
| 7 | 668 | 3.8% |
| 5 | 560 | 3.2% |
| 4 | 515 | 2.9% |
| 3 | 505 | 2.9% |
| 9 | 366 | 2.1% |
| 8 | 281 | 1.6% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 3274 | |
| : | 1647 | |
| . | 5 | 0.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 139 | |
| T | 5 | 3.4% |
| Z | 5 | 3.4% |
Lowercase Letter
| Value | Count | Frequency (%) |
| o | 139 | |
| e | 139 | |
| n | 139 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1637 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 10 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 24274 | |
| Latin | 566 | 2.3% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 7651 | |
| / | 3274 | |
| 1 | 3203 | |
| 2 | 2844 | 11.7% |
| : | 1647 | 6.8% |
| (space) | 1637 | 6.7% |
| 6 | 1108 | 4.6% |
| 7 | 668 | 2.8% |
| 5 | 560 | 2.3% |
| 4 | 515 | 2.1% |
| Other values (5) | 1167 | 4.8% |
Latin
| Value | Count | Frequency (%) |
| N | 139 | |
| o | 139 | |
| e | 139 | |
| n | 139 | |
| T | 5 | 0.9% |
| Z | 5 | 0.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 24840 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 7651 | 30.8% |
| / | 3274 | 13.2% |
| 1 | 3203 | 12.9% |
| 2 | 2844 | 11.4% |
| : | 1647 | 6.6% |
| (space) | 1637 | 6.6% |
| 6 | 1108 | 4.5% |
| 7 | 668 | 2.7% |
| 5 | 560 | 2.3% |
| 4 | 515 | 2.1% |
| Other values (11) | 1733 | 7.0% |
| Distinct | 103 |
|---|---|
| Distinct (%) | 5.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.261089 |
| Minimum | 0 |
|---|---|
| Maximum | 1194 |
| Zeros | 657 |
| Zeros (%) | 36.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 7 |
| Q3 | 22 |
| 95-th percentile | 55 |
| Maximum | 1194 |
| Range | 1194 |
| Interquartile range (IQR) | 22 |
Descriptive statistics
| Standard deviation | 40.500975 |
|---|---|
| Coefficient of variation (CV) | 2.490668 |
| Kurtosis | 453.42612 |
| Mean | 16.261089 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 17.609832 |
| Sum | 28961 |
| Variance | 1640.329 |
| Monotonicity | Not monotonic |
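The derived figures in the statistics tables above follow from a handful of base quantities. The sketch below is a consistency check only, not the report generator's own code: it recomputes the mean, variance, coefficient of variation, IQR and range from the sum, standard deviation, quartiles and extremes listed above, assuming the 1781 observations given in the dataset statistics.

```python
# Figures copied from the tables above; n comes from the dataset statistics.
n = 1781                      # number of observations
total = 28961                 # Sum
std = 40.500975               # Standard deviation
q1, q3 = 0, 22                # Q1 and Q3
minimum, maximum = 0, 1194    # Minimum and Maximum

mean = total / n              # Mean = Sum / n
variance = std ** 2           # Variance = (standard deviation)^2
cv = std / mean               # Coefficient of variation = std / mean
iqr = q3 - q1                 # Interquartile range = Q3 - Q1
value_range = maximum - minimum

print(f"mean={mean:.6f} variance={variance:.3f} cv={cv:.6f} "
      f"iqr={iqr} range={value_range}")
```

A coefficient of variation well above 1, together with a mean more than twice the median, is consistent with the strong right skew reported for this variable.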
| Value | Count | Frequency (%) |
| 0 | 657 | 36.9% |
| 7 | 56 | 3.1% |
| 8 | 53 | 3.0% |
| 5 | 48 | 2.7% |
| 4 | 48 | 2.7% |
| 6 | 39 | 2.2% |
| 15 | 36 | 2.0% |
| 10 | 35 | 2.0% |
| 12 | 32 | 1.8% |
| 9 | 31 | 1.7% |
| Other values (93) | 746 | 41.9% |
| Value | Count | Frequency (%) |
| 0 | 657 | 36.9% |
| 1 | 16 | 0.9% |
| 2 | 13 | 0.7% |
| 3 | 28 | 1.6% |
| 4 | 48 | 2.7% |
| 5 | 48 | 2.7% |
| 6 | 39 | 2.2% |
| 7 | 56 | 3.1% |
| 8 | 53 | 3.0% |
| 9 | 31 | 1.7% |
| Value | Count | Frequency (%) |
| 1194 | 1 | 0.1% |
| 709 | 1 | 0.1% |
| 326 | 1 | 0.1% |
| 288 | 1 | 0.1% |
| 226 | 1 | 0.1% |
| 208 | 2 | 0.1% |
| 197 | 1 | 0.1% |
| 188 | 1 | 0.1% |
| 185 | 1 | 0.1% |
| 157 | 1 | 0.1% |
| Distinct | 66 |
|---|---|
| Distinct (%) | 3.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.4727681 |
| Minimum | 0 |
|---|---|
| Maximum | 708 |
| Zeros | 916 |
| Zeros (%) | 51.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 5 |
| 95-th percentile | 26 |
| Maximum | 708 |
| Range | 708 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 21.807327 |
|---|---|
| Coefficient of variation (CV) | 3.9846978 |
| Kurtosis | 642.40018 |
| Mean | 5.4727681 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 21.890937 |
| Sum | 9747 |
| Variance | 475.55951 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 916 | 51.4% |
| 3 | 151 | 8.5% |
| 1 | 106 | 6.0% |
| 2 | 78 | 4.4% |
| 4 | 69 | 3.9% |
| 6 | 60 | 3.4% |
| 5 | 53 | 3.0% |
| 7 | 49 | 2.8% |
| 9 | 31 | 1.7% |
| 8 | 26 | 1.5% |
| Other values (56) | 242 | 13.6% |
| Value | Count | Frequency (%) |
| 0 | 916 | 51.4% |
| 1 | 106 | 6.0% |
| 2 | 78 | 4.4% |
| 3 | 151 | 8.5% |
| 4 | 69 | 3.9% |
| 5 | 53 | 3.0% |
| 6 | 60 | 3.4% |
| 7 | 49 | 2.8% |
| 8 | 26 | 1.5% |
| 9 | 31 | 1.7% |
| Value | Count | Frequency (%) |
| 708 | 1 | 0.1% |
| 317 | 1 | 0.1% |
| 279 | 1 | 0.1% |
| 98 | 1 | 0.1% |
| 89 | 1 | 0.1% |
| 73 | 1 | 0.1% |
| 67 | 1 | 0.1% |
| 60 | 1 | 0.1% |
| 59 | 1 | 0.1% |
| 58 | 2 | 0.1% |
| Distinct | 18 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.0606401 |
| Minimum | 0 |
|---|---|
| Maximum | 17 |
| Zeros | 657 |
| Zeros (%) | 36.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2 |
| Q3 | 5 |
| 95-th percentile | 10 |
| Maximum | 17 |
| Range | 17 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 3.3869753 |
|---|---|
| Coefficient of variation (CV) | 1.1066232 |
| Kurtosis | 0.78900501 |
| Mean | 3.0606401 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 1.1303713 |
| Sum | 5451 |
| Variance | 11.471602 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 657 | 36.9% |
| 2 | 191 | 10.7% |
| 3 | 183 | 10.3% |
| 5 | 134 | 7.5% |
| 4 | 128 | 7.2% |
| 1 | 101 | 5.7% |
| 6 | 97 | 5.4% |
| 7 | 80 | 4.5% |
| 8 | 67 | 3.8% |
| 9 | 42 | 2.4% |
| Other values (8) | 101 | 5.7% |
| Value | Count | Frequency (%) |
| 0 | 657 | 36.9% |
| 1 | 101 | 5.7% |
| 2 | 191 | 10.7% |
| 3 | 183 | 10.3% |
| 4 | 128 | 7.2% |
| 5 | 134 | 7.5% |
| 6 | 97 | 5.4% |
| 7 | 80 | 4.5% |
| 8 | 67 | 3.8% |
| 9 | 42 | 2.4% |
| Value | Count | Frequency (%) |
| 17 | 1 | 0.1% |
| 16 | 2 | 0.1% |
| 15 | 5 | 0.3% |
| 14 | 9 | 0.5% |
| 13 | 8 | 0.4% |
| 12 | 21 | 1.2% |
| 11 | 25 | 1.4% |
| 10 | 30 | 1.7% |
| 9 | 42 | 2.4% |
| 8 | 67 | 3.8% |
| Distinct | 825 |
|---|---|
| Distinct (%) | 46.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2982.3391 |
| Minimum | 0 |
|---|---|
| Maximum | 2362906 |
| Zeros | 657 |
| Zeros (%) | 36.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 672 |
| Q3 | 2328 |
| 95-th percentile | 6235 |
| Maximum | 2362906 |
| Range | 2362906 |
| Interquartile range (IQR) | 2328 |
Descriptive statistics
| Standard deviation | 56050.575 |
|---|---|
| Coefficient of variation (CV) | 18.794165 |
| Kurtosis | 1768.3952 |
| Mean | 2982.3391 |
| Median Absolute Deviation (MAD) | 672 |
| Skewness | 41.980994 |
| Sum | 5311546 |
| Variance | 3.1416669 × 10⁹ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 657 | 36.9% |
| 432 | 20 | 1.1% |
| 366 | 20 | 1.1% |
| 498 | 18 | 1.0% |
| 564 | 12 | 0.7% |
| 474 | 11 | 0.6% |
| 696 | 11 | 0.6% |
| 486 | 9 | 0.5% |
| 420 | 9 | 0.5% |
| 618 | 8 | 0.4% |
| Other values (815) | 1006 | 56.5% |
| Value | Count | Frequency (%) |
| 0 | 657 | 36.9% |
| 54 | 1 | 0.1% |
| 66 | 6 | 0.3% |
| 90 | 8 | 0.4% |
| 128 | 1 | 0.1% |
| 132 | 3 | 0.2% |
| 198 | 3 | 0.2% |
| 202 | 1 | 0.1% |
| 238 | 1 | 0.1% |
| 264 | 2 | 0.1% |
| Value | Count | Frequency (%) |
| 2362906 | 1 | 0.1% |
| 99843 | 1 | 0.1% |
| 26631 | 1 | 0.1% |
| 23383 | 1 | 0.1% |
| 20749 | 1 | 0.1% |
| 20074 | 1 | 0.1% |
| 18084 | 1 | 0.1% |
| 15162 | 1 | 0.1% |
| 14530 | 1 | 0.1% |
| 14064 | 1 | 0.1% |
| Distinct | 113 |
|---|---|
| Distinct (%) | 6.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.540146 |
| Minimum | 0 |
|---|---|
| Maximum | 1198 |
| Zeros | 655 |
| Zeros (%) | 36.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 8 |
| Q3 | 26 |
| 95-th percentile | 61 |
| Maximum | 1198 |
| Range | 1198 |
| Interquartile range (IQR) | 26 |
Descriptive statistics
| Standard deviation | 41.627173 |
|---|---|
| Coefficient of variation (CV) | 2.2452452 |
| Kurtosis | 407.84776 |
| Mean | 18.540146 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 16.308302 |
| Sum | 33020 |
| Variance | 1732.8216 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 655 | 36.8% |
| 5 | 43 | 2.4% |
| 4 | 42 | 2.4% |
| 8 | 38 | 2.1% |
| 6 | 38 | 2.1% |
| 11 | 35 | 2.0% |
| 14 | 34 | 1.9% |
| 7 | 32 | 1.8% |
| 23 | 31 | 1.7% |
| 16 | 28 | 1.6% |
| Other values (103) | 805 | 45.2% |
| Value | Count | Frequency (%) |
| 0 | 655 | 36.8% |
| 1 | 15 | 0.8% |
| 2 | 14 | 0.8% |
| 3 | 25 | 1.4% |
| 4 | 42 | 2.4% |
| 5 | 43 | 2.4% |
| 6 | 38 | 2.1% |
| 7 | 32 | 1.8% |
| 8 | 38 | 2.1% |
| 9 | 25 | 1.4% |
| Value | Count | Frequency (%) |
| 1198 | 1 | 0.1% |
| 709 | 1 | 0.1% |
| 330 | 1 | 0.1% |
| 294 | 1 | 0.1% |
| 228 | 2 | 0.1% |
| 210 | 1 | 0.1% |
| 200 | 1 | 0.1% |
| 194 | 1 | 0.1% |
| 187 | 1 | 0.1% |
| 162 | 1 | 0.1% |
| Distinct | 116 |
|---|---|
| Distinct (%) | 6.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.74621 |
| Minimum | 0 |
|---|---|
| Maximum | 1284 |
| Zeros | 590 |
| Zeros (%) | 33.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 9 |
| Q3 | 25 |
| 95-th percentile | 62 |
| Maximum | 1284 |
| Range | 1284 |
| Interquartile range (IQR) | 25 |
Descriptive statistics
| Standard deviation | 46.397969 |
|---|---|
| Coefficient of variation (CV) | 2.4750586 |
| Kurtosis | 373.60708 |
| Mean | 18.74621 |
| Median Absolute Deviation (MAD) | 9 |
| Skewness | 15.995364 |
| Sum | 33387 |
| Variance | 2152.7715 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 590 | 33.1% |
| 5 | 48 | 2.7% |
| 4 | 48 | 2.7% |
| 2 | 43 | 2.4% |
| 9 | 42 | 2.4% |
| 3 | 42 | 2.4% |
| 7 | 40 | 2.2% |
| 12 | 40 | 2.2% |
| 6 | 40 | 2.2% |
| 10 | 39 | 2.2% |
| Other values (106) | 809 | 45.4% |
| Value | Count | Frequency (%) |
| 0 | 590 | 33.1% |
| 1 | 1 | 0.1% |
| 2 | 43 | 2.4% |
| 3 | 42 | 2.4% |
| 4 | 48 | 2.7% |
| 5 | 48 | 2.7% |
| 6 | 40 | 2.2% |
| 7 | 40 | 2.2% |
| 8 | 35 | 2.0% |
| 9 | 42 | 2.4% |
| Value | Count | Frequency (%) |
| 1284 | 1 | 0.1% |
| 837 | 1 | 0.1% |
| 442 | 1 | 0.1% |
| 431 | 1 | 0.1% |
| 284 | 1 | 0.1% |
| 278 | 1 | 0.1% |
| 263 | 1 | 0.1% |
| 255 | 1 | 0.1% |
| 217 | 1 | 0.1% |
| 216 | 1 | 0.1% |
| Distinct | 885 |
|---|---|
| Distinct (%) | 49.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15892.546 |
| Minimum | 0 |
|---|---|
| Maximum | 2060012 |
| Zeros | 590 |
| Zeros (%) | 33.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 579 |
| Q3 | 9806 |
| 95-th percentile | 66124 |
| Maximum | 2060012 |
| Range | 2060012 |
| Interquartile range (IQR) | 9806 |
Descriptive statistics
| Standard deviation | 69861.93 |
|---|---|
| Coefficient of variation (CV) | 4.395893 |
| Kurtosis | 460.95739 |
| Mean | 15892.546 |
| Median Absolute Deviation (MAD) | 579 |
| Skewness | 18.275493 |
| Sum | 28304624 |
| Variance | 4.8806892 × 10⁹ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 590 | 33.1% |
| 124 | 43 | 2.4% |
| 244 | 31 | 1.7% |
| 186 | 29 | 1.6% |
| 306 | 24 | 1.3% |
| 442 | 10 | 0.6% |
| 562 | 9 | 0.5% |
| 310 | 9 | 0.5% |
| 438 | 9 | 0.5% |
| 372 | 8 | 0.4% |
| Other values (875) | 1019 | 57.2% |
| Value | Count | Frequency (%) |
| 0 | 590 | 33.1% |
| 62 | 1 | 0.1% |
| 124 | 43 | 2.4% |
| 182 | 1 | 0.1% |
| 184 | 6 | 0.3% |
| 186 | 29 | 1.6% |
| 190 | 5 | 0.3% |
| 213 | 1 | 0.1% |
| 244 | 31 | 1.7% |
| 246 | 7 | 0.4% |
| Value | Count | Frequency (%) |
| 2060012 | 1 | 0.1% |
| 1058608 | 1 | 0.1% |
| 947971 | 1 | 0.1% |
| 488313 | 1 | 0.1% |
| 486769 | 1 | 0.1% |
| 466055 | 1 | 0.1% |
| 383760 | 1 | 0.1% |
| 298694 | 1 | 0.1% |
| 295213 | 1 | 0.1% |
| 284743 | 1 | 0.1% |
| Distinct | 822 |
|---|---|
| Distinct (%) | 46.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3155.5985 |
| Minimum | 0 |
|---|---|
| Maximum | 2362906 |
| Zeros | 655 |
| Zeros (%) | 36.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 735 |
| Q3 | 2701 |
| 95-th percentile | 6657 |
| Maximum | 2362906 |
| Range | 2362906 |
| Interquartile range (IQR) | 2701 |
Descriptive statistics
| Standard deviation | 56053.78 |
|---|---|
| Coefficient of variation (CV) | 17.76328 |
| Kurtosis | 1767.47 |
| Mean | 3155.5985 |
| Median Absolute Deviation (MAD) | 735 |
| Skewness | 41.964566 |
| Sum | 5620121 |
| Variance | 3.1420263 × 10⁹ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 655 | 36.8% |
| 366 | 20 | 1.1% |
| 432 | 20 | 1.1% |
| 498 | 18 | 1.0% |
| 474 | 12 | 0.7% |
| 564 | 12 | 0.7% |
| 696 | 11 | 0.6% |
| 276 | 9 | 0.5% |
| 420 | 9 | 0.5% |
| 486 | 9 | 0.5% |
| Other values (812) | 1006 | 56.5% |
| Value | Count | Frequency (%) |
| 0 | 655 | 36.8% |
| 54 | 1 | 0.1% |
| 66 | 5 | 0.3% |
| 90 | 8 | 0.4% |
| 132 | 3 | 0.2% |
| 146 | 2 | 0.1% |
| 198 | 3 | 0.2% |
| 202 | 1 | 0.1% |
| 206 | 1 | 0.1% |
| 264 | 2 | 0.1% |
| Value | Count | Frequency (%) |
| 2362906 | 1 | 0.1% |
| 100151 | 1 | 0.1% |
| 26931 | 1 | 0.1% |
| 23877 | 1 | 0.1% |
| 21646 | 1 | 0.1% |
| 21187 | 1 | 0.1% |
| 18384 | 1 | 0.1% |
| 15314 | 1 | 0.1% |
| 14688 | 1 | 0.1% |
| 14522 | 1 | 0.1% |
| Distinct | 113 |
|---|---|
| Distinct (%) | 6.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.540146 |
| Minimum | 0 |
|---|---|
| Maximum | 1198 |
| Zeros | 655 |
| Zeros (%) | 36.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 8 |
| Q3 | 26 |
| 95-th percentile | 61 |
| Maximum | 1198 |
| Range | 1198 |
| Interquartile range (IQR) | 26 |
Descriptive statistics
| Standard deviation | 41.627173 |
|---|---|
| Coefficient of variation (CV) | 2.2452452 |
| Kurtosis | 407.84776 |
| Mean | 18.540146 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 16.308302 |
| Sum | 33020 |
| Variance | 1732.8216 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 655 | 36.8% |
| 5 | 43 | 2.4% |
| 4 | 42 | 2.4% |
| 8 | 38 | 2.1% |
| 6 | 38 | 2.1% |
| 11 | 35 | 2.0% |
| 14 | 34 | 1.9% |
| 7 | 32 | 1.8% |
| 23 | 31 | 1.7% |
| 16 | 28 | 1.6% |
| Other values (103) | 805 | 45.2% |
| Value | Count | Frequency (%) |
| 0 | 655 | 36.8% |
| 1 | 15 | 0.8% |
| 2 | 14 | 0.8% |
| 3 | 25 | 1.4% |
| 4 | 42 | 2.4% |
| 5 | 43 | 2.4% |
| 6 | 38 | 2.1% |
| 7 | 32 | 1.8% |
| 8 | 38 | 2.1% |
| 9 | 25 | 1.4% |
| Value | Count | Frequency (%) |
| 1198 | 1 | 0.1% |
| 709 | 1 | 0.1% |
| 330 | 1 | 0.1% |
| 294 | 1 | 0.1% |
| 228 | 2 | 0.1% |
| 210 | 1 | 0.1% |
| 200 | 1 | 0.1% |
| 194 | 1 | 0.1% |
| 187 | 1 | 0.1% |
| 162 | 1 | 0.1% |
| Distinct | 10 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 1 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.2634831 |
| Minimum | 0 |
|---|---|
| Maximum | 20 |
| Zeros | 976 |
| Zeros (%) | 54.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 4 |
| 95-th percentile | 8 |
| Maximum | 20 |
| Range | 20 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.9308526 |
|---|---|
| Coefficient of variation (CV) | 1.2948418 |
| Kurtosis | 0.80566895 |
| Mean | 2.2634831 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.1226261 |
| Sum | 4029 |
| Variance | 8.5898968 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 976 | 54.8% |
| 4 | 309 | 17.3% |
| 6 | 213 | 12.0% |
| 2 | 142 | 8.0% |
| 8 | 105 | 5.9% |
| 10 | 19 | 1.1% |
| 12 | 12 | 0.7% |
| 14 | 2 | 0.1% |
| 20 | 1 | 0.1% |
| 9 | 1 | 0.1% |
| (Missing) | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 0 | 976 | 54.8% |
| 2 | 142 | 8.0% |
| 4 | 309 | 17.3% |
| 6 | 213 | 12.0% |
| 8 | 105 | 5.9% |
| 9 | 1 | 0.1% |
| 10 | 19 | 1.1% |
| 12 | 12 | 0.7% |
| 14 | 2 | 0.1% |
| 20 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 20 | 1 | 0.1% |
| 14 | 2 | 0.1% |
| 12 | 12 | 0.7% |
| 10 | 19 | 1.1% |
| 9 | 1 | 0.1% |
| 8 | 105 | 5.9% |
| 6 | 213 | 12.0% |
| 4 | 309 | 17.3% |
| 2 | 142 | 8.0% |
| 0 | 976 | 54.8% |
Type
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1781 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1565 | 87.9% |
| 1 | 216 | 12.1% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| 0 | 1565 | 87.9% |
| 1 | 216 | 12.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1565 | 87.9% |
| 1 | 216 | 12.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1781 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1565 | 87.9% |
| 1 | 216 | 12.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1781 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 1565 | 87.9% |
| 1 | 216 | 12.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1781 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1565 | 87.9% |
| 1 | 216 | 12.1% |
Auto
The auto setting is an interpretable pairwise column metric that applies the following mapping (Variable_type-Variable_type : Method, Range):
- Categorical-Categorical : Cramér's V, [0,1]
- Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
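To make the Numerical-Categorical case concrete, here is a minimal sketch of computing a bias-corrected Cramér's V after binning a numeric column. It is not the report generator's own code: the helper names `cramers_v_corrected` and `discretize` are hypothetical, and equal-width binning is an assumption, since the report does not say how the numerical column is discretized.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma, 2013) for two label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    r, k = len(row), len(col)
    if n < 2 or r < 2 or k < 2:
        return 0.0
    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


def discretize(values, bins=3):
    """Equal-width binning (an assumed stand-in for the report's discretizer)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Toy data: a numeric column that perfectly separates two classes,
# and a pair of categorical columns that are independent of each other.
labels = ["a"] * 50 + ["b"] * 50
numeric = [0.0] * 50 + [10.0] * 50
v_mixed = cramers_v_corrected(discretize(numeric, bins=2), labels)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25,
                                    ["x", "y", "x", "y"] * 25)
print(v_mixed, v_independent)
```

The choice of bins affects the result for numeric columns, so values from this sketch should not be expected to match the report's matrix exactly.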
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
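The calculation described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the report's implementation; the helper names are hypothetical, and ties are given average ranks, which is the usual convention but an assumption here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their std deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    return covariance(rx, ry) / math.sqrt(covariance(rx, rx) * covariance(ry, ry))


# A cubic relationship is nonlinear but monotonic, so rho reaches its extremes.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho_up = spearman_rho(x, y)
rho_down = spearman_rho(x, [-v for v in y])
print(rho_up, rho_down)
```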
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
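A minimal sketch of that calculation, with a check of the invariance property, might look like this. The function name `pearson_r` and the toy data are illustrative assumptions; the rescaling uses positive scale factors, for which r is unchanged.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their std deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shifting and (positively) rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# A steep and a shallow exact line both give r = 1: the slope does not matter.
r_steep = pearson_r(x, [100 * a for a in x])
r_shallow = pearson_r(x, [0.01 * a for a in x])
print(r, r_rescaled, r_steep, r_shallow)
```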
Kendall's τ
Similar to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
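The pair-counting definition above corresponds to the tau-a variant, sketched below in plain Python. This is an illustration under that assumption: tied pairs count as neither concordant nor discordant, and statistical libraries often default to the tie-adjusted tau-b instead, so results can differ on data with many ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, ignoring tie adjustments."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)


# Six pairs in total: five concordant, one discordant (the swapped 3 and 2).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```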
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

| | URL | URL_LENGTH | NUMBER_SPECIAL_CHARACTERS | CHARSET | SERVER | CONTENT_LENGTH | WHOIS_COUNTRY | WHOIS_STATEPRO | WHOIS_REGDATE | WHOIS_UPDATED_DATE | TCP_CONVERSATION_EXCHANGE | DIST_REMOTE_TCP_PORT | REMOTE_IPS | APP_BYTES | SOURCE_APP_PACKETS | REMOTE_APP_PACKETS | SOURCE_APP_BYTES | REMOTE_APP_BYTES | APP_PACKETS | DNS_QUERY_TIMES | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M0_109 | 16 | 7 | iso-8859-1 | nginx | 263.0 | None | None | 10/10/2015 18:21 | None | 7 | 0 | 2 | 700 | 9 | 10 | 1153 | 832 | 9 | 2.0 | 1 |
| 1 | B0_2314 | 16 | 6 | UTF-8 | Apache/2.4.10 | 15087.0 | None | None | None | None | 17 | 7 | 4 | 1230 | 17 | 19 | 1265 | 1230 | 17 | 0.0 | 0 |
| 2 | B0_911 | 16 | 6 | us-ascii | Microsoft-HTTPAPI/2.0 | 324.0 | None | None | None | None | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0 |
| 3 | B0_113 | 17 | 6 | ISO-8859-1 | nginx | 162.0 | US | AK | 7/10/1997 4:00 | 12/09/2013 0:45 | 31 | 22 | 3 | 3812 | 39 | 37 | 18784 | 4380 | 39 | 8.0 | 0 |
| 4 | B0_403 | 17 | 6 | UTF-8 | None | 124140.0 | US | TX | 12/05/1996 0:00 | 11/04/2017 0:00 | 57 | 2 | 5 | 4278 | 61 | 62 | 129889 | 4586 | 61 | 4.0 | 0 |
| 5 | B0_2064 | 18 | 7 | UTF-8 | nginx | NaN | SC | Mahe | 3/08/2016 14:30 | 3/10/2016 3:45 | 11 | 6 | 9 | 894 | 11 | 13 | 838 | 894 | 11 | 0.0 | 0 |
| 6 | B0_462 | 18 | 6 | iso-8859-1 | Apache/2 | 345.0 | US | CO | 29/07/2002 0:00 | 1/07/2016 0:00 | 12 | 0 | 3 | 1189 | 14 | 13 | 8559 | 1327 | 14 | 2.0 | 0 |
| 7 | B0_1128 | 19 | 6 | us-ascii | Microsoft-HTTPAPI/2.0 | 324.0 | US | FL | 18/03/1997 0:00 | 19/03/2017 0:00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0 |
| 8 | M2_17 | 20 | 5 | utf-8 | nginx/1.10.1 | NaN | None | None | 8/11/2014 7:41 | None | 0 | 0 | 0 | 0 | 2 | 3 | 213 | 146 | 2 | 2.0 | 1 |
| 9 | M3_75 | 20 | 5 | utf-8 | nginx/1.10.1 | NaN | None | None | 8/11/2014 7:41 | None | 0 | 0 | 0 | 0 | 2 | 1 | 62 | 146 | 2 | 2.0 | 1 |
| | URL | URL_LENGTH | NUMBER_SPECIAL_CHARACTERS | CHARSET | SERVER | CONTENT_LENGTH | WHOIS_COUNTRY | WHOIS_STATEPRO | WHOIS_REGDATE | WHOIS_UPDATED_DATE | TCP_CONVERSATION_EXCHANGE | DIST_REMOTE_TCP_PORT | REMOTE_IPS | APP_BYTES | SOURCE_APP_PACKETS | REMOTE_APP_PACKETS | SOURCE_APP_BYTES | REMOTE_APP_BYTES | APP_PACKETS | DNS_QUERY_TIMES | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1771 | M4_43 | 170 | 17 | UTF-8 | Apache | NaN | ES | Barcelona | 17/09/2008 0:00 | 2/09/2016 0:00 | 0 | 0 | 0 | 0 | 0 | 2 | 124 | 0 | 0 | 0.0 | 1 |
| 1772 | M4_61 | 173 | 34 | UTF-8 | Apache | NaN | ES | Barcelona | 17/09/2008 0:00 | 2/09/2016 0:00 | 1 | 1 | 1 | 90 | 1 | 5 | 416 | 90 | 1 | 0.0 | 1 |
| 1773 | M4_39 | 178 | 16 | UTF-8 | Apache | NaN | ES | Barcelona | 17/09/2008 0:00 | 2/09/2016 0:00 | 0 | 0 | 0 | 0 | 0 | 3 | 186 | 0 | 0 | 0.0 | 1 |
| 1774 | B0_156 | 183 | 29 | ISO-8859-1 | Microsoft-IIS/7.5; litigation_essentials.lexisnexis.com 9999 | 4890.0 | US | NY | 26/06/1997 0:00 | 18/11/2014 0:00 | 22 | 2 | 7 | 2062 | 30 | 26 | 8161 | 2742 | 30 | 8.0 | 0 |
| 1775 | M4_45 | 194 | 17 | UTF-8 | Apache | NaN | ES | Barcelona | 17/09/2008 0:00 | 2/09/2016 0:00 | 0 | 0 | 0 | 0 | 0 | 3 | 186 | 0 | 0 | 0.0 | 1 |
| 1776 | M4_48 | 194 | 16 | UTF-8 | Apache | NaN | ES | Barcelona | 17/09/2008 0:00 | 2/09/2016 0:00 | 0 | 0 | 0 | 0 | 0 | 3 | 186 | 0 | 0 | 0.0 | 1 |
| 1777 | M4_41 | 198 | 17 | UTF-8 | Apache | NaN | ES | Barcelona | 17/09/2008 0:00 | 2/09/2016 0:00 | 0 | 0 | 0 | 0 | 0 | 2 | 124 | 0 | 0 | 0.0 | 1 |
| 1778 | B0_162 | 201 | 34 | utf-8 | Apache/2.2.16 (Debian) | 8904.0 | US | FL | 15/02/1999 0:00 | 15/07/2015 0:00 | 83 | 2 | 6 | 6631 | 87 | 89 | 132181 | 6945 | 87 | 4.0 | 0 |
| 1779 | B0_1152 | 234 | 34 | ISO-8859-1 | cloudflare-nginx | NaN | US | CA | 1/04/1998 0:00 | 9/12/2016 0:00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0 |
| 1780 | B0_676 | 249 | 40 | utf-8 | Microsoft-IIS/8.5 | 24435.0 | US | Wisconsin | 14/11/2008 0:00 | 20/11/2013 0:00 | 19 | 6 | 11 | 2314 | 25 | 28 | 3039 | 2776 | 25 | 6.0 | 0 |